IReNA
IReNA (Integrated Regulatory Network Analysis) is an R package to perform regulatory network analysis. IReNA contains two methods to reconstruct gene regulatory networks. The first is using single-cell RNA sequencing (scRNA-seq) data alone. The second is integrating scRNA-seq data and chromatin accessibility profiles from Assay for Transposase Accessible Chromatin using sequencing (ATAC-seq). IReNA performs modular regulatory networks to reveal key transcription factors and significant regulatory relationships among modules, providing biological insights on regulatory mechanisms.
If ATAC-seq data is available, use part 3 to refine regulatory relationships. If ATAC-seq data is not available, use part 2 to refine regulatory relationships.
For users who want to use ATAC-seq data to refine regulatory relationships, Computer or server of linux system and the following software are needed: samtools, bedtools and fimo.
1. Installation
IReNA needs R version 4.0 or higher, and Bioconductor version 3.12.
First, install Bioconductor, open R platform and run:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(version = "3.12")Next, install several Bioconductor dependencies:
BiocManager::install(c('Rsamtools', 'ChIPseeker', 'monocle',
'RcisTarget', 'RCy3', 'clusterProfiler'))Then, install IReNA from GitHub:
install.packages("devtools")
devtools::install_github("jiang-junyao/IReNA")Finally, check whether IReNA was installed correctly, restart R session and run:
library(IReNA)2. Test data download
Test data used in the following steps can be downloaded from https://github.com/jiang-junyao/IReNA-test-data. The raw data of ATAC-seq used to run ATAC-seq preprocessing pipline are available from https://www.ncbi.nlm.nih.gov/biosample?Db=biosample&DbFrom=bioproject&Cmd=Link&LinkName=bioproject_biosample&LinkReadableName=BioSample&ordinalpos=1&IdsFromResult=357084
3. Workflow
4. Inputs of IReNA
If only scRNA-seq or bulk RNA-seq data are used to run IReNA, the following inputs are needed: (2) Raw counts of scRNA-seq or bulk RNA-seq data and (3) Reference genome of the species (4) GTF file (5) Motif information of transcription factors.
If both ATAC-seq data and scRNA-seq data are used to perform IReNA, the following files are needed: (1) Bam, peak and footprint files of ATAC-seq data (2) Raw counts of scRNA-seq or bulk RNA-seq data (3) Reference genome of the species (5) Motif information of transcription factors.
(1) Bam, peak and footprint files of ATAC-seq data
Bam, peak and footprint files of ATAC-seq data can be separately generated by the step 4, step 8, and step 10 in ATAC-seq analysis pipline.
(2) Seurat object and monocle object / Raw counts of scRNA-seq or bulk RNA-seq data
For users who have analyzed seurat object and monocle object, IReNA provides function to add pseudotime from monocle object to meta data of seurat object(this funciotn only support monocle object of monocle2). Then, continue with the next steps.
###Add pseudotime to the Seurat object
seurat_with_time <- add_pseudotime(seurat_object, monocle_object)For users who only have raw counts, IReNA provides the function ‘load_counts’ to load raw counts of scRNA-seq data, and return seurat object. If the data is 10X format, set the parameter ‘datatype = 0’. If the data is normal counts format (‘txt’ as the suffix of the file name), set the parameter ‘dayatype =1’.
### load 10X counts
seurat_object <- load_counts('10X_data/sample1/', datatype = 0)
### load normal counts
seurat_object <- load_counts('test_data.txt',datatype = 1)
If bulk RNA-seq data are used to identify basic regulatory relationships, just load raw counts of bulk RNA-seq data, then use the same codes as the scRNA-seq data for further analysis. We suggest to load the expression profile of only differentially expressed genes, otherwise much more time will be used to calculate the correlation of each gene pair. Deseq2 and edgeR can be used to identify differentially expressed genes in bulk RNA-seq data.
(3) Reference genome of the species
Reference genome should be the same as that used for mapping ATAC-seq data, which can be downloaded from UCSC database.
(4) GTF file
Gene transfer format, you can download it from http://www.ensembl.org/info/data/ftp/index.html
(5) Motif information of transcription factors
IReNA contains DNA motif datasets for four species (Homo sapiens, Mus musculus, Zebrafish and Chicken) derived from TRANSFAC version 201803. Following codes are used to get the motif dataset from TRANSFAC or user-defined motif dataset which should have the same format as these from TRANSFAC database.
library(IReNA)
###call Mus musculus motif database
motif1 <- Tranfac201803_Mm_MotifTFsF
###call Homo sapiens motif database
motif1 <- Tranfac201803_Hs_MotifTFsF
###call Zebrafish motif database
motif1 <- Tranfac201803_Zf_MotifTFsF
###call Chicken motif database
motif1 <- Tranfac201803_Ch_MotifTFsF5. Tutorial
IReNA contains four main parts to reconstruct regulatory networks:
Part 1: Analyze scRNA-seq or bulk RNA-seq data to get basic regulatory relationships
Part 2: Use RcisTarget to refine regulatory relaionships
Part 3: Analyze ATAC-seq data to refine regulatory relationships (with ATAC-seq data)
Part 4: Regulatory network analysis and visualization
If ATAC-seq data is available, use part3 to refine regulatory relationships. If ATAC-seq data is not available, use part2 to refine regulatory relationships.
Part 1: Analyze scRNA-seq or bulk RNA-seq data to get basic regulatory relationships
IReNA supports three formats of the inputs:
raw counts of scRNA-seq or bulk RNA-seq data. Raw counts can be loaded through the function ‘load_counts’ in IReNA and converted to the Seurat object;
the Seurat object which contains raw counts. After the data are loaded, IReNA will calculate pseudotime using R package monocle and add pseudotime to the metadata of the Seurat object.
Seurat object with Pseudotime in the metadata
The test data can be downloaded in https://github.com/jiang-junyao/IReNA-test-data. The test data are from the study of human spermatogonial stem cell development which is available in https://www.cell.com/cell-stem-cell/fulltext/S1934-5909(17)30370-3.
Parameter ‘gene.use’ in ‘get_pseudotime’ function indicate the genes use to calculate pseudotime, if it’s null, this function will automatically use variable genes calculated by ‘FindVariableFeatures’ function in Seurat package to calculate pseudotime.
The test data for the Seurat object ‘seurat_object.rds’ can be downloaded from https://github.com/jiang-junyao/IReNA-test-data/blob/master/scRNA-seq.zip
Hint: if you input Seurat object with pseudotime, just skip this part.
###Read seurat_object
seurat_object <- readRDS('seurat_object.rds')
###calculate the pseudotime and return monocle object
monocle_object <- get_pseudotime(seurat_object,gene.use)
###Add pseudotime to the Seurat object
###This function only support monocle object from monocle2
seurat_with_time <- add_pseudotime(seurat_object, monocle_object)Next, use differentially expressed genes (DEGs) across the pseudotime to refine the seurat object, if you already have identified DEGs, you just need to run subset function in seurat:
### DEGs used here is the character class
seurat_with_time <- subset(seurat_with_time, features = DEGs)I also recommend ‘differentialGeneTest’ function in monocle to calculate DEGs across pseudotime. DEGs will be used to make expression profile in further analysis. (If you use our test data, you can skip this part, because our test data only contains differentially expressed genes)
### identify DEGs across pseudotime (qvalue < 0.05 and num_cells_expressed > 0.1)
library(monocle)
monocle_object <- detectGenes(monocle_object, min_expr = 3)
monocle_object <- estimateDispersions(monocle_object)
diff1 <- monocle::differentialGeneTest(mo,fullModelFormulaStr = "~Pseudotime",relative_expr = TRUE)
sig_genes <- subset(diff1, qval < 0.05)
sig_genes <- subset(sig_genes, num_cells_expressed > 0.1)
### Use DEGs to refine seurat object
seurat_with_time <- subset(seurat_with_time, features = rownames(sig_genes))Then, cells are divided into 50 bins across pseudotime. The bin is removed if all genes in this bin have no expression. The gene is filtered if absolute fold change < 0.01 (set by the parameter ‘FC’). Then, genes will be clustered through K-means algorithm (the number of clusters ‘K’ is set by the parameter ‘K1’).
If bulk RNA-seq data are used to identify regulatory relationships, load your expression matrix as expression_profile that generated by get_SmoothByBin_PseudotimeExp(), then run the following codes. I suggest to input expression profile that only contains differentially expressed genes, or you will a huge of time to calculate correlation of each gene pair.
###Get expression profiles ordered by pseudotime
expression_profile <- get_SmoothByBin_PseudotimeExp(seurat_with_time, Bin = 50)
###Filter noise and logFC in expression profile
expression_profile_filter <- fileter_expression_profile(expression_profile, FC=0.01)
###K-means clustering
clustering <- clustering_Kmeans(expression_profile_filter, K1=4)
clustering[1:5,1:5]## KmeansGroup FoldChangeQ95 SmExp1 SmExp2 SmExp3
## TCEB3 1 2.395806 -0.2424532 -0.8964990 -0.9124960
## CLK1 1 2.508335 -0.1819044 0.7624798 0.4867972
## MATR3 1 2.700294 -1.4485729 0.7837425 0.3028892
## AKAP11 1 2.415084 -0.6120681 -0.3849580 0.3898393
## HSF2 1 2.528111 -0.8125698 -0.6166004 0.8533309
Visualize the clustering of gene expression profiles through the heatmap
plot_kmeans_pheatmap(clustering,ModuleColor1 = c('#67C7C1','#5BA6DA','#FFBF0F','#C067A9'))Because of the characteristics of Kmeans algorithm, different results will be obtained each time clustering.
Add Ensmble ID of the genes in the first column, then calculate the correlation of each gene pair and select gene pairs which contain at least one transcription factor and have > 0.4 absolute correlation (set by the parameter ‘correlation_filter’). The value of‘correlation_filter’ depends on the noise of data, the more noisy the data, the bigger ‘correlation_filter’ is needed to get the confident correlation relationships. The correlation with p value < 0.05 is suggested to be used for the parameter‘correlation_filter’.
###Add Ensembl ID as the first column of clustering results
Kmeans_clustering_ENS <- add_ENSID(clustering, Spec1='Hs')
Kmeans_clustering_ENS[1:5,1:5]## Symbol KmeansGroup FoldChangeQ95 SmExp1 SmExp2
## ENSG00000011007 TCEB3 1 2.395806 -0.2424532 -0.8964990
## ENSG00000013441 CLK1 1 2.508335 -0.1819044 0.7624798
## ENSG00000015479 MATR3 1 2.700294 -1.4485729 0.7837425
## ENSG00000023516 AKAP11 1 2.415084 -0.6120681 -0.3849580
## ENSG00000025156 HSF2 1 2.528111 -0.8125698 -0.6166004
### Caculate the correlation
motif1 <- Tranfac201803_Hs_MotifTFsF
### for
regulatory_relationships <- get_cor(Kmeans_clustering_ENS, motif = motif1, correlation_filter = 0.7, start_column = 4)Part 2: Analyze binding motifs of target genes to refine regulatory relaionships (without ATAC-seq data)
For ATAC-seq data is not available. IReNA uses fimo to check whether motifs of transcription factors that regulate the target gene occur in the upstream of target genes. If motifs of transcription factors that regulate the target gene exist, this gene pair will be retained.
First, get TSS (transcription start site) regions (default is upstream 1000 to downstream 500) of target genes. Gtf file used here is available from http://www.ensembl.org/info/data/ftp/index.html
gtf <- read.delim("D:/Homo_sapiens.GRCh38.104.gtf", header=FALSE, comment.char="#")
gene_tss <- get_tss_region(gtf,rownames(Kmeans_clustering_ENS))head(gene_tss)## gene chr start end
## 1 ENSG00000237973 chr1 630074 631574
## 2 ENSG00000116285 chr1 8027309 8025809
## 3 ENSG00000162441 chr1 9944407 9942907
## 4 ENSG00000116273 chr1 6612731 6614231
## 5 ENSG00000177674 chr1 11735084 11736584
## 6 ENSG00000171608 chr1 9650731 9652231
Then, get the sequence of these TSS regions based on reference genome (reference genome can be download from UCSC database), and use fimo to scan these sequence to check whether the motif of transcription factor that regulates the target gene occur in the promoter regions of the target gene. Because fimo software only supports linux environment, we generate some shell script to run Fimo software.
You should set the following four parameters:
refdir: path of reference genome
fimodir: path of Fimo software. If Fimo has been set to the global environment variable, just set ‘fimodir <- fimo’.
outputdir1: output path of the shell scripts and sequence of target genes tss regions.(function ‘find_motifs_targetgenes’ will automatically generate two folders (‘fasta’ and ‘fimo’) in the path ‘outputdir1’, and store sequence of target genes tss regions in the ‘fasta’ and shell scripts in the ‘fimo’)
Motifdir: path of the motif file, which can be downloaded from https://github.com/jiang-junyao/MEMEmotif or TRANSFAC database.
Please note that, at the end of ‘outputdir1’,‘motifdir’ and ‘sequencedir’, the symbol ‘/’should be contained. What’s more, the chromosome name of your reference genome used here should be the same as the chromosome name in the gene_tss
### run the following codes in the R under linux environment
refdir='/public/home/user/genome/hg38.fa'
fimodir <- 'fimo'
outputdir1 <- '/public/home/user/fimo/'
motifdir <- '/public/home/user/fimo/Mememotif/'
find_motifs_targetgenes(gene_tss,motif1,refdir,fimodir,outputdir1,motifdir)Then run the following shell codes to activate fimo
### run the following codes in the shell
cd /public/home/user/fimo/fimo
sh fimoall.sh
Then, Fimo result are stored in the ‘fimo’ folder under outputdir1, and we run the following R codes to refine regulatory relationships
motif1 <- Tranfac201803_Hs_MotifTFsF
outputdir <- paste0(outputdir1,'fimo/')
fimo_regulation <- generate_fimo_regulation(outputdir,motif1)
filtered_regulatory_relationships <- filter_regulation_fimo(fimo_regulation, regulatory_relationships)Part 3: Analyze ATAC-seq data to refine regulatory relationships (with ATAC-seq data)
For ATAC-seq data is available, IReNA calculates the footprints of transcription factors and footprint occupancy score (FOS) to refine regulatory relationships. The footprints whose distance is less than 4 are merged and then the sequence of each footprint is obtained from the reference genome through the function ‘get_merged_fasta’. The reference genome should be in fasta/fa format which can be downloaded from UCSC or other genome database.
The footprint file used below is available in https://github.com/jiang-junyao/IReNA-test-data/blob/master/ATAC-seq.zip
The human genome used below can be downloaded from https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
###merge footprints whose distance is less than 4
filtered_footprints <- read.table('footprints.bed',sep = '\t')
fastadir <- 'Genome/hg38.fa'
merged_fasta <- get_merged_fasta(filtered_footprints,fastadir)
write.table(merged_fasta,'merged_footprints.fasta',row.names=F,quote=F,col.names=F)After obtaining the motif sequences, use fimo software to identify binding motifs in the footprints. Because fimo software only supports linux environment, we generate a shell script to run Fimo software.
First, identify differentially expressed genes related motifs through the function ‘motif_select’, which will reduce the running time of the subsequent analysis process.
Then, you should set the following five parameters:
fimodir: path of Fimo software. If Fimo has been set to the global environment variable, just set ‘fimodir <- fimo’.
outputdir1: output path of the shell scripts.
outputdir: output path of Fimo result.
motifdir: path of the motif file, which can be downloaded from https://github.com/jiang-junyao/MEMEmotif or TRANSFAC database.
sequencedir: path of the sequence which is generated by the function ‘get_merged_fasta’.
Please note that, at the end of ‘outputdir’ and ‘sequencedir’ the symbol ‘/’should be contained.
### Identify differentially expressed genes related motifs
motif1 <- motifs_select(Tranfac201803_Hs_MotifTFsF, rownames(Kmeans_clustering_ENS)) ###Kmeans_clustering_ENS was obtained in part1
### run find_motifs()
fimodir <- 'fimo'
outputdir1 <- '/public/home/user/fimo/'
outputdir <- '/public/home/user/fimo/output/'
motifdir <- '/public/home/user/fimo/Mememotif/'
sequencedir <- '/public/home/user/fimo/merged_footprints.fasta'
find_motifs(motif1,step=20,fimodir, outputdir1, outputdir, motifdir, sequencedir)Then run the following shell codes to activate fimo
### run the following commands in the shell
cd /public/home/user/fimo/
mkdir output
sh ./fimo_all.sh
Then, we combine these Fimo consequence in ‘outputdir’. Notably outputdir folder should only contain fimo result files. Next, we load the peaks file and overlap differential peaks and motif footprints through overlap_footprints_peaks() function
###Combine all footprints of motifs
combined <- combine_footprints(outputdir)
peaks <- read.delim('differential_peaks.bed')
overlapped <- overlap_footprints_peaks(combined,peaks)However, the running time of overlap_footprints() is too long, so it’s highly recommanded to use bedtools to do overlap in linux system. If you want to use bedtools to do overlap, you need to output ‘combined_footprints’ dataframe, and transfer it to shell.(If you make analysis in linux system, ignore transfer part)
### output combined_footprints
write.table(combined,'combined.txt',quote = F,row.names = F,col.names = F,sep = '\t')
### Transfer combied.txt and differential_peaks.bed to linux system, and then run the following commands in the shell
bedtools intersect -a combined.txt -b differential_peaks.bed -wa -wb > overlappd.txtNext, we intergrate bioconductor package ChIPseeker to get footprint-related genes. Before we run get_related_genes(), we need to specify TxDb, which can be download from: Bioconductor TxDb list. Kmeans_clustering_ENS used here was obtained in part1.
### If you make overlap by bedtools, read 'overlapped.txt' to R
overlapped <- read.table('overlapped.txt')
###get footprint-related genes
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
list1 <- get_related_genes(overlapped,txdb = txdb,motif=Tranfac201803_Mm_MotifTFsF,Species = 'Hs')
###Get candidate genes/TFs-related peaks
list2 <- get_related_peaks(list1,Kmeans_clustering_ENS)
### output filtered footprints
write.table(list2[[1]],'filtered_footprints.bed', quote = F, row.names = F, col.names = F, sep = '\t')Then, because of the size of original BAM file is too large, so we need to use samtools to extract footprints realated regions in BAM to reduce running time of function which analyze bam files in IReNA. (If you use our test data, just skip this step)
### transfer filtered_footprints.bed to linux system and run the following codes
samtools view -hb -L filtered_footprint.bed SSC_patient1.bam > SSC1_filter.bam
samtools view -hb -L filtered_footprint.bed SSC_patient2.bam > SSC2_filter.bam
samtools view -hb -L filtered_footprint.bed esc.bam > esc_filter.bam
Furthermore, we count the cuts of each position in footprints by wig_track(), and use these cuts to calculate the FOS of footprints to identify enriched TFs which determine the regulatory relationship. regulatory_relationships used here was calculated in part1. Parameter FOS_threshold used here is the threshold to filter low quality footprints, you can increase it to reduce the number of exported footprints.
### calculate cuts of each each position in footprints
bamfilepath1 <- 'SSC1_filter.bam'
bamfilepath2 <- 'SSC2_filter.bam'
bamfilepath3 <- 'esc_filter.bam'
cuts1 <- wig_track(bamfilepath = bamfilepath1,bedfile = list2[[1]])
cuts2 <- wig_track(bamfilepath = bamfilepath2,bedfile = list2[[1]])
cuts3 <- wig_track(bamfilepath = bamfilepath3,bedfile = list2[[1]])
wig_list <- list(cuts1,cuts2,cuts3)
### get related genes of footprints with high FOS
potential_regulation <- Footprints_FOS(wig_list,list2[[2]], FOS_threshold = 0.1)
### Use information of footprints with high FOS to refine regulatory relationships
filtered_regulatory <- filter_ATAC(potential_regulation,regulatory_relationships)Part 4: Regulatory network analysis and visualization
After we get ‘filtered_regulatory_relationships’ and ‘Kmeans_clustering_ENS’, we can reconstruct regulatory network. Run network_analysis() to get regulatory, this function will generate a list which contain the following 9 elements:
(1)Cor_TFs.txt: list of expressed TFs in the gene networks.
(2)Cor_EnTFs.txt: list of TFs which significantly regulate gene modules (or enriched TFs).
(3)FOSF_RegMTF_Cor_EnTFs.txt: regulatory pairs in which the source gene is enriched TF.
(4)FOSF_RegMTF_Cor_EnTFs.txt: regulatory pairs in which both source gene and target gene are enriched TFs.
(5)FOSF_RegMTF_Cor_EnTFs.txt: regulatory pairs only including regulations within each module but not those between modules, in this step
(6)TF_list: enriched TFs which significantly regulate gene modules
(7)TF_module_regulation: details of enriched TFs which significantly regulate gene modules
(8)TF_network: regulatory network for enriched transcription factors of each module
(9)intramodular_network: intramodular regulatory network
Here, we use refined regulatory relationships from part2 to reconstruct regulatory networks
TFs_list <- network_analysis(filtered_regulatory_relationships,Kmeans_cluster_Ens,TFFDR1 = 10,TFFDR2 = 10)We can also make enrichment analysis for differentially expressed genes in each module. Before you run this function, you need to download the org.db for your species through BiocManager.
### Download Homo sapiens org.db
#BiocManger::install('org.Hs.eg.db')
library(org.Hs.eg.db)
### Enrichment analysis (KEGG)
enrichment_KEGG <- enrich_module(Kmeans_clustering_ENS, org.Hs.eg.db, enrich.db = 'KEGG',organism = 'hsa')
#enrichment_GO <- enrich_module(Kmeans_cluster_ENS, org.Hs.eg.db, 'GO')
head(enrichment_KEGG)## ID Description module -log10(q-value) GeneRatio
## hsa03010 hsa03010 Ribosome 1 6.500237 13/101
## hsa03040 hsa03040 Spliceosome 1 1.964315 9/101
## hsa03022 hsa03022 Basal transcription factors 1 1.964315 5/101
## hsa05016 hsa05016 Huntington's disease 1 1.126355 9/101
## hsa03013 hsa03013 RNA transport 1 1.126355 8/101
## hsa05200 hsa05200 Pathways in cancer 2 4.866522 38/276
## BgRatio pvalue p.adjust qvalue
## hsa03010 92/5894 3.661614e-09 3.258836e-07 3.160551e-07
## hsa03040 128/5894 3.138210e-04 1.119399e-02 1.085638e-02
## hsa03022 37/5894 3.773255e-04 1.119399e-02 1.085638e-02
## hsa05016 183/5894 3.932733e-03 7.708051e-02 7.475579e-02
## hsa03013 152/5894 4.330366e-03 7.708051e-02 7.475579e-02
## hsa05200 327/5894 1.153409e-07 1.949262e-05 1.359809e-05
## geneID
## hsa03010 6122/6143/6206/6194/51187/6135/6161/6167/6159/6166/9045/6136/6191
## hsa03040 10992/3312/9775/10569/83443/5356/10594/8449/494115
## hsa03022 9519/2962/6877/2957/6879
## hsa05016 9519/5978/7019/51164/498/27089/54205/4512/2876
## hsa03013 11218/8761/1983/9775/5042/50628/6612/10921
## hsa05200 8312/2263/2260/6932/7188/3912/6655/331/5595/7976/5296/1021/7473/4790/329/1027/2335/6772/6654/5290/5335/2258/3845/5156/54583/5899/3688/26060/3716/836/6774/8313/5293/3913/6777/5727/5154/112398
## Count
## hsa03010 13
## hsa03040 9
## hsa03022 5
## hsa05016 9
## hsa03013 8
## hsa05200 38
Moreover, you can do GO analysis
library(org.Hs.eg.db)
### Enrichment analysis (GO)
enrichment_GO <- enrich_module(Kmeans_clustering_ENS, enrich.db = 'GO',org.Hs.eg.db)
You can visualize regulatory networks for enriched transcription factors of each module through plot_network() function by setting type parameter as ‘TF’. This plot shows regulatory relationships between transcription factors in different modules that significantly regulating other modules. The size of each vertex determine the significance of this transcription factor. Yellow edges are positive regulation, grey edges are negative regulation.
plot_tf_network(TFs_list)
You can visualize intramodular network with enriched function through plot_intramodular_network() function. Before run this function, you can select enriched functions of each module that you want to present in the plot. If you input all enriched functions, this function will automatically select the function with the highest -log10(qvalue) in each module to present in the plot. What’s more transcription factor with the most edge numbers in each module will be presented in the plot too.
### select functions that you want to present in the figure
enrichment_KEGG_select <- enrichment_KEGG[c(1,6,11,16),]
### plotting
plot_intramodular_network(TFs_list,enrichment_KEGG_select,layout = 'circle')It is strongly recommended to use Cytoscape to display the regulatory networks. We provide a function that can provide different Cytoscape styles. You need to install and open Cytoscape before running the function.
###optional: display the network in cytoscape, open cytoscape before running this function
initiate_cy(TFs_list, layout1='degree-circle', type='TF')
initiate_cy(TFs_list, layout1='grid', type='module')These are the picture we processed through Cytoscape, which can show the regulatory relationship of modularized transcription factors.
Use Cytoscape to visualize regulatory network for enriched transcription factors of each module
Use Cytoscape to visualize intramodular network
Option: Use RcisTarget to refine regulatory relaionships (without ATAC-seq data)
IReNA also provides filter_regulation function (Based on RcisTarget) to refine regulation relationships. Due to the limitations of RcisTarget, this function currently only supports three species (Hs, Mm and Fly). So if the species of your data is not included, and you don’t have ATAC-seq data, you can use unrefined regulatory relaionships to perform part4 analysis directly.
Before run this function, you need to download Gene-motif rankings database from https://resources.aertslab.org/cistarget/ and set the Rankingspath1 as the path of the downloaded Gene-motif rankings database. If you don’t know which database to choose, we suggest that using ‘hg19-500bp-upstream-7species.mc9nr’ for human, using ‘mm9-500bp-upstream-10species.mc9nr’ for mouse, using ‘dm6-5kb-upstream-full-tx-11species.mc8nr’ for fly. You can download it manually, or use R code (If you are in mainland china, i suggest you to download database from website):
### Download Gene-motif rankings database
featherURL <- "https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-tss-centered-10kb-7species.mc9nr.feather"
download.file(featherURL, destfile=basename(featherURL)) # saved in current dir
### Refine regulatory relaionships
Rankingspath1 <- 'hg19-500bp-upstream-7species.mc9nr1.feather' # download from https://resources.aertslab.org/cistarget/
filtered_regulation <- filter_regulation(Kmeans_clustering_ENS, regulatory_relationships, 'Hs', Rankingspath1)